Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms
Pratap, Abhinav, Kumar, Sushant, Chakravarty, Suchinton
This study addresses the critical need for accurate and efficient object detection in assistive technologies for visually impaired individuals. We systematically evaluate the performance of four prominent real-time object detection algorithms (YOLO, SSD, Faster R-CNN, and Mask R-CNN) within the context of indoor navigation assistance. Our analysis, conducted on the Indoor Objects Detection dataset, focuses on key parameters including detection accuracy, processing speed, and adaptability to the unique challenges of indoor environments. This research contributes to a deeper understanding of adaptive machine learning applications that can significantly improve indoor navigation solutions for the visually impaired, promoting inclusivity and accessibility.
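Comparisons of detection accuracy across such algorithms typically score predicted boxes against ground truth with intersection-over-union (IoU), the building block behind mAP. A minimal sketch (the function name is illustrative, not from the study):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction is usually counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5; mAP then averages precision over thresholds and classes.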
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [19], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
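The "predicts object bounds and objectness scores at each position" step can be pictured as a small head sliding over the shared feature map: per spatial position, k objectness logits and 4k box-regression deltas, one set per anchor. A toy NumPy sketch under assumed shapes (weights and k are illustrative; the real RPN uses a 3x3 conv followed by two 1x1 convs):

```python
import numpy as np

def rpn_head(features, w_cls, w_reg, k):
    """Toy RPN head: per-position objectness logits and box deltas.

    features: (H, W, C) shared convolutional feature map
    w_cls:    (C, k)    -> k objectness logits per position (one per anchor)
    w_reg:    (C, 4*k)  -> 4 box-regression deltas per anchor
    """
    h, w, c = features.shape
    flat = features.reshape(-1, c)            # one feature vector per position
    objectness = flat @ w_cls                 # (H*W, k) logits
    deltas = flat @ w_reg                     # (H*W, 4k) box offsets
    return objectness.reshape(h, w, k), deltas.reshape(h, w, k, 4)

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))
scores, deltas = rpn_head(feat, rng.normal(size=(16, 9)), rng.normal(size=(16, 36)), k=9)
```

Because the same weights are applied at every position, the head is fully convolutional and adds almost no cost on top of the shared backbone features, which is what makes the proposals "nearly cost-free".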
Object Detection in Aerial Images in Scarce Data Regimes
Most contributions on Few-Shot Object Detection (FSOD) evaluate their methods on natural images only, yet the transferability of the announced performance is not guaranteed for applications on other kinds of images. We demonstrate this with an in-depth analysis of existing FSOD methods on aerial images and observe a large performance gap compared to natural images. Small objects, more numerous in aerial images, are the cause of the apparent performance gap between natural and aerial images. As a consequence, we improve FSOD performance on small objects with a carefully designed attention mechanism. In addition, we propose a scale-adaptive box similarity criterion that improves the training and evaluation of FSOD methods, particularly for small objects. We also contribute to generic FSOD with two distinct approaches based on metric learning and fine-tuning. Impressive results are achieved with the fine-tuning method, which encourages tackling more complex scenarios such as Cross-Domain FSOD. We conduct preliminary experiments in this direction and obtain promising results. Finally, we address the deployment of the detection models inside COSE's systems. Detection must be done in real-time in extremely large images (more than 100 megapixels), with limited computation power. Leveraging existing optimization tools such as TensorRT, we successfully tackle this engineering challenge.
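The abstract does not give the form of its scale-adaptive box similarity criterion, but the motivation (plain IoU is brittle for tiny boxes, where a few pixels of offset collapse the overlap) can be illustrated with one plausible, purely hypothetical variant: blend IoU with a center-distance term, leaning on distance more as boxes get smaller. This is an illustration of the idea, not the paper's criterion:

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def scale_adaptive_similarity(a, b, ref_size=32.0):
    """Hypothetical scale-adaptive similarity: small boxes weight center
    distance, large boxes weight IoU. Not the criterion from the paper."""
    ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    dist = math.hypot(ca[0] - cb[0], ca[1] - cb[1])
    size = math.sqrt(max((a[2] - a[0]) * (a[3] - a[1]), 1e-9))
    closeness = math.exp(-dist / ref_size)   # 1.0 when centers coincide
    alpha = min(size / ref_size, 1.0)        # small box -> small alpha
    return alpha * iou(a, b) + (1 - alpha) * closeness
```

Under such a criterion, a tiny predicted box whose center is nearly right still receives meaningful credit even when its raw IoU is near zero, which is the property a small-object-friendly matcher needs during both training and evaluation.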
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Zhu, Yongxin, Liu, Zhen, Liang, Yukang, Li, Xin, Liu, Hao, Bao, Changcun, Xu, Linli
In this paper, we propose a novel multi-modal framework for Scene Text Visual Question Answering (STVQA), which requires models to read scene text in images for question answering. Apart from text or visual objects, which could exist independently, scene text naturally links the text and visual modalities together by conveying linguistic semantics while simultaneously being a visual object in an image. Unlike conventional STVQA models, which treat the linguistic semantics and visual semantics of scene text as two separate features, we propose a paradigm of "Locate Then Generate" (LTG), which explicitly unifies these two semantics with the spatial bounding box as a bridge connecting them. Specifically, LTG first locates the region in an image that may contain the answer words with an answer location module (ALM) consisting of a region proposal network and a language refinement network, which map to each other one-to-one via the scene text bounding box. Next, given the answer words selected by the ALM, LTG generates a readable answer sequence with an answer generation module (AGM) based on a pre-trained language model. As a benefit of the explicit alignment of the visual and linguistic semantics, even without any scene-text-based pre-training tasks, LTG boosts absolute accuracy by +6.06% and +6.92% on the TextVQA dataset and the ST-VQA dataset respectively, compared with a non-pre-training baseline. We further demonstrate that LTG effectively unifies the visual and text modalities through the spatial bounding box connection, which is underappreciated in previous methods.
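The "locate" half of the pipeline can be caricatured in a few lines: given scene-text tokens with bounding boxes and (hypothetical) ALM scores, keep the high-scoring tokens and order them by reading position before handing them to a generator. Everything here (token format, threshold, sort order) is an assumed simplification, not the paper's implementation:

```python
def locate_then_select(ocr_tokens, threshold=0.5):
    """Toy 'locate' step: keep scene-text tokens scored above threshold by a
    hypothetical answer location module, then order them top-to-bottom,
    left-to-right via their bounding boxes. The real AGM would rewrite this
    selection into a fluent answer with a pre-trained language model."""
    picked = [t for t in ocr_tokens if t["score"] >= threshold]
    picked.sort(key=lambda t: (t["box"][1], t["box"][0]))  # box = (x1, y1, x2, y2)
    return " ".join(t["text"] for t in picked)

tokens = [
    {"text": "main", "box": (40, 10, 80, 24), "score": 0.9},
    {"text": "12",   "box": (10, 10, 30, 24), "score": 0.8},
    {"text": "exit", "box": (10, 40, 50, 54), "score": 0.2},
]
answer = locate_then_select(tokens)
```

The point of the sketch is the role of the bounding box: it is the one piece of information shared by the visual token and its text, which is what lets the region proposals and the language side be put in one-to-one correspondence.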
Dimensionality of datasets in object detection networks
Chawda, Ajay, Vierling, Axel, Berns, Karsten
In recent years, convolutional neural networks (CNNs) have been used in a large number of computer vision tasks, one of which is object detection for autonomous driving. Although CNNs are widely used in many areas, what happens inside the network remains unexplained on many levels. Our goal is to determine the effect of intrinsic dimension (i.e., the minimum number of parameters required to represent the data) in different layers on the accuracy of an object detection network for augmented datasets. Our investigation shows that there is a difference between the representation of normal and augmented data during feature extraction.
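There are several estimators of intrinsic dimension; one common linear one (which may differ from the estimator used in the paper) counts how many principal components are needed to capture almost all of the variance of a layer's activations. A self-contained sketch on synthetic data with a known latent dimension of 3:

```python
import numpy as np

def pca_intrinsic_dimension(x, var_threshold=0.99):
    """Rough intrinsic-dimension estimate: the smallest number of principal
    components whose cumulative explained variance reaches the threshold.
    A simple linear estimator, shown for illustration only."""
    centered = x - x.mean(axis=0)
    # Squared singular values of the centered data are the component variances.
    s = np.linalg.svd(centered, compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratio, var_threshold) + 1)

rng = np.random.default_rng(0)
# 3 latent factors embedded in a 20-dimensional ambient space, plus tiny noise.
latent = rng.normal(size=(500, 3))
mix = rng.normal(size=(3, 20))
data = latent @ mix + 0.01 * rng.normal(size=(500, 20))
dim = pca_intrinsic_dimension(data)
```

Applied per layer, an estimator like this makes the paper's question concrete: if augmentation changes the intrinsic dimension of intermediate feature maps, the representation of normal and augmented data must differ during feature extraction.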
Centerpoints Are All You Need in Overhead Imagery
Inder, James Mason, Lowell, Mark, Maltenfort, Andrew J.
Every day, observation satellites capture terabytes of imagery of the Earth's surface that feed into a wide variety of civil and military applications. This stream of data has grown so large that only automated methods can feasibly analyze it. One critical component of remote sensing analysis is object detection: locating objects of interest on the Earth's surface in overhead imagery. Automated object detection algorithms have advanced by leaps and bounds over the last decade, but they still require vast amounts of labeled data for training, which is expensive and tedious to produce. Any technique that can reduce the resources needed to label objects in overhead imagery is therefore desirable. Most existing datasets for training overhead object detectors are labeled with horizontal bounding boxes [1][2][3][4][5], object-aligned bounding boxes [6][7][8][9][10], or segmentation masks [11][12].
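The contrast between these label types and the centerpoint labels the title advocates is easy to make concrete: a centerpoint is what remains when a horizontal box annotation is collapsed to its midpoint, which is why centerpoints are so much cheaper to produce. A trivial sketch of that conversion (function name is illustrative):

```python
def box_to_centerpoint(box):
    """Collapse a horizontal bounding-box label (x1, y1, x2, y2) into a
    centerpoint label (cx, cy) -- the cheaper annotation type: one click
    per object instead of two or more."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
```

The annotation saving comes from the reverse direction being unnecessary: a centerpoint-trained detector never needs box extents at labeling time at all.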